This is our evaluation of forecasting models and individual forecasters who have contributed forecasts of Covid-19 case and death numbers in Germany and Poland. These forecasts were submitted to the German and Polish Forecast Hub each week.
The evaluations are our own and not authorised by the German Forecast Hub team. We cannot rule out mistakes and the analyses are subject to change.
If you have questions or want to give feedback, please create an issue on our GitHub repository.
Here is an overall ranking of all forecasters, ordered by relative skill. Relative skill is calculated from all pairwise comparisons between forecasters in terms of the weighted interval score (WIS). See below for a more detailed explanation of the scoring metrics used. ‘Overall’ shows the complete ranking; ‘latest’ spans only the last 5-6 weeks of data. ‘Detailed’ is the full data set, which you can download for your own analysis.
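To make the pairwise idea concrete, here is a minimal sketch. It is not the code behind the ranking: it assumes that each pair of forecasters is compared by the ratio of their mean WIS on the forecasts they both submitted, and that relative skill is the geometric mean of those ratios (lower is better). The function name `relative_skill` and the input layout are hypothetical.

```python
import math


def relative_skill(scores):
    """Sketch of a pairwise-comparison relative skill (lower is better).

    `scores` maps forecaster name -> {forecast_id: WIS}. Assumptions:
    all WIS values are strictly positive, and pairs are compared only
    on the forecasts both forecasters submitted.
    """
    skill = {}
    for a in scores:
        log_ratios = []
        for b in scores:
            shared = scores[a].keys() & scores[b].keys()
            if not shared:
                continue  # no overlapping forecasts, so no comparison
            mean_a = sum(scores[a][k] for k in shared) / len(shared)
            mean_b = sum(scores[b][k] for k in shared) / len(shared)
            log_ratios.append(math.log(mean_a / mean_b))
        # Geometric mean of the pairwise mean-score ratios
        # (the self-comparison contributes a ratio of 1).
        skill[a] = math.exp(sum(log_ratios) / len(log_ratios))
    return skill


example = {
    "A": {1: 10.0, 2: 20.0},
    "B": {1: 20.0, 2: 40.0},
    "C": {2: 20.0},
}
skills = relative_skill(example)
ranking = sorted(skills, key=skills.get)  # best (lowest) first
```

Comparing only on shared forecasts is what lets forecasters who joined late or missed weeks be ranked alongside those with a complete record.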
The following metrics are used:
This is a visualisation of all forecasts made so far.
This table shows either your rank among all forecasters or your standardised rank. The standardised rank is computed as (100 - the forecaster's percentile rank) among all forecasters for a given target and forecast date. In other words, every forecaster is first assigned a rank, where 1 is the best and the worst rank equals the number of available forecasts for that date. This rank is then transformed to a scale from 0 to 100 such that 100 is best and 0 is worst. Ranks are determined from the weighted interval scores.
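As an illustration of the transformation, the sketch below ranks forecasters by WIS for a single target and forecast date and rescales the ranks linearly to 0-100. The linear rescaling `100 * (n - rank) / (n - 1)` is an assumption about how the percentile rank is defined; the function name is hypothetical, and ties are not handled.

```python
def standardised_ranks(wis_by_forecaster):
    """Map forecaster -> standardised rank (100 = best, 0 = worst).

    `wis_by_forecaster` holds one WIS per forecaster for a single
    target and forecast date. Lower WIS is better. Ties are ignored
    in this sketch.
    """
    ordered = sorted(wis_by_forecaster, key=wis_by_forecaster.get)
    n = len(ordered)
    result = {}
    for rank, name in enumerate(ordered, start=1):
        if n == 1:
            result[name] = 100.0  # assumption: a lone forecaster scores 100
        else:
            result[name] = 100.0 * (n - rank) / (n - 1)
    return result


ranks = standardised_ranks({"a": 5.0, "b": 10.0, "c": 20.0})
```

Because the scale does not depend on how many forecasts were submitted on a given date, standardised ranks can be compared across dates with different numbers of participants.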
## model rank
## standardised model rank
The weighted interval score can be decomposed into three parts: sharpness (the amount of uncertainty around the forecast), overprediction and underprediction. This visualisation shows how each forecaster's score is split across these three penalties.
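The decomposition can be sketched for a single forecast as follows. This assumes the common WIS definition in which each central prediction interval with nominal coverage 1 - alpha is weighted by alpha / 2, the absolute error of the median is weighted by 1/2, and the sum is divided by (number of intervals + 1/2). The function name and input layout are hypothetical, and the scores shown on this page may have been computed with slightly different conventions.

```python
def wis_components(observed, median, intervals):
    """Split the WIS of one forecast into its three penalties.

    `intervals` is a list of (alpha, lower, upper) tuples, one per
    central prediction interval with nominal coverage 1 - alpha.
    """
    sharpness = over = under = 0.0

    # Median term (weight 1/2): counts as over- or underprediction.
    if median > observed:
        over += 0.5 * (median - observed)
    else:
        under += 0.5 * (observed - median)

    for alpha, lower, upper in intervals:
        weight = alpha / 2
        sharpness += weight * (upper - lower)  # penalty for wide intervals
        if observed < lower:  # forecast too high
            over += weight * (2 / alpha) * (lower - observed)
        elif observed > upper:  # forecast too low
            under += weight * (2 / alpha) * (observed - upper)

    norm = len(intervals) + 0.5
    parts = {
        "sharpness": sharpness / norm,
        "overprediction": over / norm,
        "underprediction": under / norm,
    }
    parts["wis"] = sum(parts.values())
    return parts


# Observation of 120 against a median of 100 with 50% and 90% intervals.
parts = wis_components(120, 100, [(0.5, 90, 110), (0.1, 70, 130)])
```

In this example the observation lies above the median and above the 50% interval, so the score consists only of sharpness and underprediction.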
The following graphic gives an overview of the forecasters and models analysed and the number of forecasts they contributed.
Most of the ‘models’ are human forecasters, but some are not: